Elasticsearch Query DSL Syntax Notes
TLDR
- Query DSL Advantages: Compared to Query String, DSL supports nested queries, geospatial queries, custom scoring (Function Score), and more complex boolean logic, while offering a clear structure and precise error messages.
- Match Query: The core of full-text search. The `operator` parameter controls logic (OR/AND), `minimum_should_match` supports flexible count and percentage rules, and `fuzziness` supports fuzzy matching.
- Multi Match Query: Provides modes such as `best_fields` (default, takes the highest score), `most_fields` (sums scores), and `cross_fields` (treats multiple fields as a whole), suitable for multi-field searching.
- Combined Fields Query: Term-centric; treats multiple `text` fields as a single combined field, suitable for cases where keywords are scattered across titles, summaries, and body text.
- Range Query: Handles numeric and date ranges. For date queries, it is recommended to explicitly define the `format` and prioritize string formats to avoid values being parsed as millisecond timestamps.
- Nested Query: Must be used when the field type is `nested`. It preserves the relationship between fields within array elements, preventing query errors caused by the flattening of `object` types.
- Performance Warning: `wildcard` and `regexp` queries have poor performance. Avoid using leading wildcards and limit `max_determinized_states` to prevent resource exhaustion.
Query DSL vs Query String
In production environments, Query DSL is the recommended choice. Compared to Query String, it offers the following advantages:
- Functional Completeness: Nested queries, geospatial queries, custom scoring (`function_score`), and complex boolean logic combinations can only be implemented via Query DSL.
- Clear Structure: The JSON structure clearly defines query types and parameters, making it easy to maintain and debug, with error messages that precisely point to the problematic field.
Common Query DSL Syntax
1. Match Query - Full-Text Search
Used for full-text search; it performs tokenization and relevance scoring.
- operator parameter: Controls the logical relationship between multiple tokens. `OR` is the default; `AND` requires the document to contain all terms.
- minimum_should_match: Only effective when `operator = "OR"`. Supports positive integers (absolute count), negative integers (allowed missing count), percentages (rounded down), and conditional combinations (e.g., `3<90%`).
- fuzziness: Only applicable to `text` fields. It is recommended to set it to `AUTO` to let Elasticsearch automatically determine the edit distance based on term length.
- lenient: Defaults to `false`. Setting it to `true` allows ignoring fields when types do not match, preventing the query from throwing errors.
- zero_terms_query: When no tokens remain after analysis, `none` (default) returns no results, while `all` returns all documents.
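The parameters above can be combined in a single request body. The following is a minimal sketch that builds the body as a plain Python dict; the helper name `match_query` and the field name `title` are assumptions made for illustration, not part of any client library.

```python
import json


def match_query(field, text, operator="or", minimum_should_match=None, fuzziness=None):
    """Build a match query body as a plain dict (hypothetical helper)."""
    params = {"query": text, "operator": operator}
    # Optional parameters are only sent when explicitly requested.
    if minimum_should_match is not None:
        params["minimum_should_match"] = minimum_should_match
    if fuzziness is not None:
        params["fuzziness"] = fuzziness
    return {"query": {"match": {field: params}}}


# "title" is an assumed field name; at least 2 of the 3 terms must match.
body = match_query("title", "quick brown fox", minimum_should_match="2", fuzziness="AUTO")
print(json.dumps(body, indent=2))
```

The resulting JSON can be sent as the body of a `_search` request with whichever client is in use.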
2. Multi Match Query - Multi-Field Search
Searches for the same keyword across multiple fields.
- best_fields (default): Takes the highest scoring field; suitable for finding the "best match" in a single field.
- most_fields: Sums the scores of all fields; suitable for scenarios with "multiple similar fields."
- cross_fields: Treats multiple fields as one large field; suitable for queries like names or addresses that span across fields.
- phrase / phrase_prefix / bool_prefix: Special query types for phrases and prefixes, suitable for autocomplete or exact phrase searches.
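As a rough illustration of how the `type` parameter selects one of the modes listed above, the sketch below builds a `multi_match` body. The helper name and the field names `first_name` and `last_name` are hypothetical.

```python
import json

# The multi_match types named in this section.
MULTI_MATCH_TYPES = {
    "best_fields", "most_fields", "cross_fields",
    "phrase", "phrase_prefix", "bool_prefix",
}


def multi_match_query(text, fields, match_type="best_fields"):
    """Build a multi_match query body (hypothetical helper)."""
    if match_type not in MULTI_MATCH_TYPES:
        raise ValueError(f"unknown multi_match type: {match_type}")
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                "type": match_type,
            }
        }
    }


# A person's name split across two assumed fields is a typical cross_fields case.
body = multi_match_query("Will Smith", ["first_name", "last_name"], match_type="cross_fields")
print(json.dumps(body, indent=2))
```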
3. Combined Fields Query - Cross-Field Term Query
Uses a term-centric approach, treating multiple text fields as a single combined field.
- Limitations: All fields must be of `text` type and use the same `search_analyzer`.
- Advantages: Performs exceptionally well when keywords are scattered across multiple fields (e.g., title, summary, body).
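A minimal sketch of a `combined_fields` request body follows. The field names `title`, `summary`, and `body` are assumed for illustration and would all need to be `text` fields sharing one search analyzer.

```python
import json


def combined_fields_query(text, fields, operator="or"):
    """Build a combined_fields query body (hypothetical helper)."""
    return {
        "query": {
            "combined_fields": {
                "query": text,
                "fields": list(fields),
                "operator": operator,
            }
        }
    }


# With operator "and", every term must appear in at least one of the fields,
# but the terms may be spread across different fields.
body = combined_fields_query("database systems", ["title", "summary", "body"], operator="and")
print(json.dumps(body, indent=2))
```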
4. Match Phrase Query - Phrase Query
Requires terms to appear in the specified order.
- slop parameter: The maximum number of positions allowed between matching terms; defaults to `0` (the terms must be adjacent and in order).
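To show where `slop` sits in the request, here is a small sketch that builds a `match_phrase` body. The helper name and the field name `message` are assumptions.

```python
import json


def match_phrase_query(field, phrase, slop=0):
    """Build a match_phrase query body (hypothetical helper)."""
    return {"query": {"match_phrase": {field: {"query": phrase, "slop": slop}}}}


# slop=0 requires "quick fox" verbatim; slop=1 also tolerates one
# intervening position, as in "quick brown fox".
strict = match_phrase_query("message", "quick fox")
relaxed = match_phrase_query("message", "quick fox", slop=1)
print(json.dumps(relaxed, indent=2))
```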
5. Term and Terms Query - Exact Match
- Term: Used for exact value queries; the query value is not tokenized. When used on `text` fields, it is compared against the tokens produced at index time, so it only matches if the value equals one of those tokens exactly.
- Terms: Similar to SQL's `IN` query. Supports Terms Lookup, which can retrieve field values from existing documents to use as search criteria.
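The three request shapes described above can be sketched as follows. All index, document ID, and field names here (`status`, `color`, `my-index-000001`, `"2"`) are placeholders chosen for illustration.

```python
import json


def term_query(field, value):
    """Exact match on a single value; the value is not analyzed."""
    return {"query": {"term": {field: {"value": value}}}}


def terms_query(field, values):
    """Match any of several exact values, similar to SQL IN."""
    return {"query": {"terms": {field: list(values)}}}


def terms_lookup_query(field, index, doc_id, path):
    """Terms Lookup: read the candidate values from an existing document."""
    return {"query": {"terms": {field: {"index": index, "id": doc_id, "path": path}}}}


single = term_query("status", "published")
several = terms_query("status", ["published", "archived"])
lookup = terms_lookup_query("color", "my-index-000001", "2", "color")
print(json.dumps(lookup, indent=2))
```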
6. Range Query - Numeric and Date Ranges
Used for numeric and date ranges.
- Date Handling: It is recommended to explicitly specify the `format`. If mixing numbers and strings, numbers will be interpreted as millisecond timestamps, leading to parsing errors; it is recommended to use string formats consistently.
- Date Math: Supports expressions such as `now`, `+1h`, and `-1d`, optionally followed by a rounding operation (e.g., `/d`, `/M`), as in `now-1d/d`. When the anchor is a fixed date rather than `now`, separate it from the math expression with `||` (e.g., `2024-01-01||+1M/d`).
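The sketch below builds range bodies for the cases discussed above. It converts both bounds to strings on purpose, so a bare number can never be read as a millisecond timestamp. The helper name and the field name `created_at` are assumptions.

```python
import json


def date_range_query(field, gte, lt, date_format=None):
    """Build a range query on a date field (hypothetical helper)."""
    # Bounds are always sent as strings, never as bare numbers.
    bounds = {"gte": str(gte), "lt": str(lt)}
    if date_format is not None:
        bounds["format"] = date_format
    return {"query": {"range": {field: bounds}}}


# Explicit format with string bounds.
explicit = date_range_query("created_at", "2024-01-01", "2024-02-01", date_format="yyyy-MM-dd")
# Date math relative to now: from the start of yesterday up to the start of today.
relative = date_range_query("created_at", "now-1d/d", "now/d")
# Date math anchored to a fixed date, using || as the separator.
anchored = date_range_query("created_at", "2024-01-01||/M", "2024-01-01||+1M/M", date_format="yyyy-MM-dd")
print(json.dumps(anchored, indent=2))
```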
7. Exists Query - Field Existence Query
Matches documents in which a field has an indexed value (that is, the value is not `null` or an empty array).
- Inverse Query: Use a `bool` query combined with `must_not` and `exists`.
- Notes: A value cannot be detected by an `exists` query if the field is mapped with `index: false` and `doc_values: false`, or if the value is longer than the field's `ignore_above` limit.
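As a small illustration of the inverse pattern described above, this sketch builds both the plain `exists` body and its `must_not` counterpart. The field name `user.id` is a placeholder.

```python
import json


def exists_query(field):
    """Match documents that have an indexed value for the field."""
    return {"query": {"exists": {"field": field}}}


def missing_query(field):
    """Match documents that lack an indexed value for the field."""
    return {"query": {"bool": {"must_not": [{"exists": {"field": field}}]}}}


has_user = exists_query("user.id")
no_user = missing_query("user.id")
print(json.dumps(no_user, indent=2))
```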
8. Prefix, Wildcard, and Regexp Query
- Prefix: Queries documents starting with a specific string.
- Wildcard: Uses `*` and `?` for pattern matching. Avoid leading wildcards (e.g., `*term`), which force Elasticsearch to examine every term in the field.
- Regexp: Supports regular expressions. It is usually the most expensive of the three; avoid it whenever possible. The Lucene regular expression engine does not support the `^` and `$` anchors; a pattern must match the entire string.
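One way to apply the advice above is to reject risky patterns before they reach the cluster. The following sketch is an assumed client-side guard, not an Elasticsearch feature; the helper names and the field name `user.id` are hypothetical.

```python
import json


def prefix_query(field, value):
    """Match terms that start with the given string."""
    return {"query": {"prefix": {field: {"value": value}}}}


def wildcard_query(field, pattern):
    """Build a wildcard query, refusing patterns that start with a wildcard."""
    if pattern.startswith(("*", "?")):
        raise ValueError("leading wildcards are not allowed: " + pattern)
    return {"query": {"wildcard": {field: {"value": pattern}}}}


def regexp_query(field, pattern, max_determinized_states=10000):
    """Build a regexp query with an explicit automaton-state limit."""
    # No ^ or $ anchors: the pattern has to match the whole term anyway.
    return {
        "query": {
            "regexp": {
                field: {
                    "value": pattern,
                    "max_determinized_states": max_determinized_states,
                }
            }
        }
    }


print(json.dumps(wildcard_query("user.id", "ki*y"), indent=2))
print(json.dumps(regexp_query("user.id", "k.*y"), indent=2))
```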
9. Fuzzy Query - Typo-Tolerant Search
Fault-tolerant query that allows for spelling errors.
- Recommendation: For `text` fields, prioritize using the `match` query with the `fuzziness` parameter instead of the `fuzzy` query directly to ensure the query terms are processed by the analyzer.
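The two request shapes compared above are sketched below, together with a small function that mirrors the documented default thresholds of `fuzziness: AUTO` (terms of 0 to 2 characters must match exactly, 3 to 5 allow one edit, longer terms allow two). The helper names and the field name `title` are assumptions.

```python
import json


def auto_edit_distance(term):
    """Edit distance allowed by the default AUTO thresholds for a term."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


def fuzzy_match_query(field, text):
    """Recommended form: the text is analyzed before fuzzy matching."""
    return {"query": {"match": {field: {"query": text, "fuzziness": "AUTO"}}}}


def raw_fuzzy_query(field, term):
    """Direct fuzzy query: the term is used as given, without analysis."""
    return {"query": {"fuzzy": {field: {"value": term, "fuzziness": "AUTO"}}}}


for word in ["go", "quick", "elasticsaerch"]:
    print(word, auto_edit_distance(word))
print(json.dumps(fuzzy_match_query("title", "Elasticsaerch guide"), indent=2))
```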
10. Nested Query - Nested Object Query
Used to query nested type fields, preserving the internal relationships of array elements.
- Problem Scenario: If using the `object` type, arrays are flattened, causing the loss of relationships between elements during queries (e.g., a search for "John gave 3 stars" can incorrectly match a document in which John gave 5 stars and a different reviewer gave 3).
- Solution: Define the field as a `nested` type and use the `nested` query to ensure conditions match "within the same sub-document."
- inner_hits: This parameter can be used to retrieve the specific nested objects that matched the criteria, rather than just returning the parent document.
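The flattening problem can be imitated in a few lines of plain Python. The model below is a deliberate simplification of how `object` and `nested` fields behave, not the actual Lucene implementation, and the `reviews`, `author`, and `stars` names are invented for the example.

```python
import json

# One parent document with two reviews (hypothetical data).
doc = {"reviews": [{"author": "John", "stars": 5}, {"author": "Alice", "stars": 3}]}


def flattened_match(document, author, stars):
    """object type: each sub-field becomes an independent list of values."""
    authors = [review["author"] for review in document["reviews"]]
    all_stars = [review["stars"] for review in document["reviews"]]
    return author in authors and stars in all_stars


def nested_match(document, author, stars):
    """nested type: both conditions must hold within the same element."""
    return any(
        review["author"] == author and review["stars"] == stars
        for review in document["reviews"]
    )


def nested_query(path, author, stars):
    """Build the corresponding nested query body with inner_hits enabled."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "must": [
                            {"match": {path + ".author": author}},
                            {"match": {path + ".stars": stars}},
                        ]
                    }
                },
                "inner_hits": {},
            }
        }
    }


print(flattened_match(doc, "John", 3))  # True: a false positive
print(nested_match(doc, "John", 3))     # False: John gave 5 stars
print(json.dumps(nested_query("reviews", "John", 3), indent=2))
```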
Change Log
- 2025-11-04 Initial document created.